techniques. Finally, we use the final feature vector as an input to a radial 
basis function-based support vector machine classifier (RbfSVM) for 
pedestrian recognition. Experiments are performed on the Daimler pedestrian 
classification benchmark dataset. Results show that the area under the curve 
(AUC) and the detection rate of our model are less affected by weather 
conditions compared to other common models like histogram of oriented 
gradients (HOG) and Gabor filter bank (GFB) detectors. 

Keywords: Image descriptors, Image restoration, Pedestrian detection, Weather conditions 

1. INTRODUCTION 

Traffic accidents in Morocco cause more than 4,000 deaths each year, 25% of which are pedestrians. Hence, 
in the last few years, there has been considerable interest in building pedestrian detection systems that 
improve traffic safety by providing drivers with information about their environment and any potential 
hazard, and by performing countermeasures in dangerous situations. Two of the most significant tasks are 
the recognition and localization of pedestrians in front of a vehicle. 

The pedestrian detection task is particularly difficult for several reasons. First, there is a 
wide range of possible pedestrian appearances due to changing articulated poses, clothing, lighting 
conditions, and cluttered backgrounds. In addition, detecting pedestrians is more difficult under poor 
visibility caused by fog or mist in the atmosphere. Images taken under such bad weather 
conditions are subject to degradation and extreme loss of contrast. To recover an image with good visibility 
from the degraded one, many researchers use either additional information or 
only the degraded image [1]-[4]. 

The main contribution of this paper is a pedestrian recognition system that is robust to weather 
conditions without applying any image restoration technique. Object contours are a rich source of 
information for discerning shape and for detecting pedestrians [5]. Contour cues can be 
estimated using multiple image processing techniques such as edge detection [6] and image segmentation 
[7]-[10]. Our proposed detection framework locates contour cues by applying different 
edge filters to images and extracting features with different descriptors: census transform (CT) [11], 
modified census transform (MCT) [12], and local gradient pattern (LGP) [13], which can characterize critical 
contour information for pedestrian detection [5]. Our objectives are to show that the critical information in 
our proposed features leads to detection that is robust to weather effects, and to demonstrate the importance of 
using degraded data for model evaluation. 

The rest of this paper is organized as follows: Section 2 describes the materials and methods. 
Section 3 introduces our proposed recognition models. After presenting and discussing our test results in 
Section 4, we conclude in Section 5. 


2. MATERIALS AND METHODS 
2.1. Image degradation 

The degradation of images by fog and mist is a common problem that poses safety and efficiency 
issues for all transportation systems. In atmospheric propagation studies, distributions of particles such as 
cloud, fog, haze, and mist are all known as atmospheric aerosols. The effects of such aerosols are the 
following: i) contrast degradation, due to the attenuation of light by a cloud of solid or liquid 
particles in the air and/or the dispersion of some direct light flux toward the camera; ii) blurring, which can be 
caused by many factors such as long-term atmospheric turbulence, defocusing, and movement during the 
capture process; and iii) noise, a random variation of brightness or color information in images that comes 
from sensor effects such as thermal or electrical signals and from environmental conditions such as rain and snow. 


2.2. Edge detection 

Edge detection is a common first step in extracting information from images. It continues to be an 
active area of research because it reduces the data to be processed and filters out information 
that may be considered less important while conserving the important structural characteristics of an image. 
In edge detection, we locate the boundaries or edges of objects in an image by determining where the 
brightness of the image varies significantly. Edge detectors typically use two convolution masks, one 
calculating the gradient in the x-direction and another calculating the gradient in the y-direction. This operation can 
be described by (1): 


$G_x(i,j) = \mathrm{mask}_x * I(i,j), \quad G_y(i,j) = \mathrm{mask}_y * I(i,j)$ (1) 

where I(i,j) denotes a neighborhood of pixel (i,j) of the input image, and * denotes convolution. 
The gradient magnitude is estimated by combining the two individual images $G_x$ and $G_y$ using the 
approximation $G = |G_x| + |G_y|$, and rescaled to [0, 255] by (2): 

$G_{changed} = 255 \cdot \frac{G - G_{min}}{G_{max} - G_{min}}$ (2) 
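The two-mask gradient estimate and the rescaling of (1) and (2) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the helper names `convolve2d_same` and `edge_map` are hypothetical, zero padding at the image border is an assumption, and the Sobel masks of (3) are used only as an example pair.

```python
import numpy as np

def convolve2d_same(image, mask):
    """2-D convolution with zero padding (assumed); output has the input's shape."""
    flipped = np.flipud(np.fliplr(mask))          # convolution flips the mask
    mh, mw = flipped.shape
    top, left = (mh - 1) // 2, (mw - 1) // 2
    padded = np.pad(image.astype(float),
                    ((top, mh - 1 - top), (left, mw - 1 - left)))
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for di in range(mh):
        for dj in range(mw):
            out += flipped[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def edge_map(image, mask_x, mask_y):
    """Gradient magnitude |Gx| + |Gy|, rescaled to [0, 255] as in (2)."""
    gx = convolve2d_same(image, mask_x)
    gy = convolve2d_same(image, mask_y)
    g = np.abs(gx) + np.abs(gy)
    g_min, g_max = g.min(), g.max()
    if g_max == g_min:                            # flat image: no edges to rescale
        return np.zeros(g.shape, dtype=np.uint8)
    return np.round(255.0 * (g - g_min) / (g_max - g_min)).astype(np.uint8)

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# Toy image with a vertical step edge between columns 2 and 3.
toy = np.zeros((6, 6))
toy[:, 3:] = 100
edges = edge_map(toy, SOBEL_X, SOBEL_Y)
```

Any other mask pair of the same form can be passed to `edge_map` in place of the Sobel masks.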


Edge filters used in this paper are the following: 
a) Sobel: uses two 3 by 3 kernels to detect gradients in the horizontal and the vertical directions. 

$\mathrm{mask}_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad \mathrm{mask}_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}$ (3) 

b) Roberts: uses two 2 by 2 kernels to measure gradients in opposing diagonal directions. 

$\mathrm{mask}_x = \begin{bmatrix} +1 & 0 \\ 0 & -1 \end{bmatrix}, \quad \mathrm{mask}_y = \begin{bmatrix} 0 & +1 \\ -1 & 0 \end{bmatrix}$ (4) 

c) Prewitt: uses two 3 by 3 kernels to measure horizontal and vertical gradients. 

$\mathrm{mask}_x = \begin{bmatrix} -1 & 0 & +1 \\ -1 & 0 & +1 \\ -1 & 0 & +1 \end{bmatrix}, \quad \mathrm{mask}_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ +1 & +1 & +1 \end{bmatrix}$ (5) 

d) Local variance: local statistics can establish the gradient of the image, so the local variance can be used to 
produce the edge map. Our modified edge detection based on local variance can be outlined as follows: 

- Read the image as grayscale 

- Define a window of size (3,3): 

$k = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$ 

- Calculate the local variance $L_{var} = k * (I^2) - (k * I)^2$, where * denotes convolution 
- Compute the global median of the local variance image $L_{var}$ 
- Set local variance to 255 if it is strictly less than the global median else set it to zero 
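The local-variance steps listed above can be sketched as follows. This is an illustrative reading of the outline, not the authors' code: the function names are hypothetical, replicating edge pixels at the border is an assumption, and the thresholding polarity is taken literally from the last step of the list.

```python
import numpy as np

def box_mean_3x3(image):
    """Convolution with k = ones((3, 3)) / 9; border pixels are replicated (assumption)."""
    padded = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    total = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            total += padded[di:di + h, dj:dj + w]
    return total / 9.0

def local_variance_edges(gray):
    """Binary edge map obtained by thresholding the local variance at its global median."""
    gray = gray.astype(float)
    l_var = box_mean_3x3(gray ** 2) - box_mean_3x3(gray) ** 2
    median = np.median(l_var)
    # Polarity as stated in the text: values strictly below the median become 255.
    return np.where(l_var < median, 255, 0).astype(np.uint8)

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(36, 18))     # random stand-in for an 18x36 sample
edge_image = local_variance_edges(gray)
```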


2.3. Image descriptors 

Image descriptors are an image processing topic concerned with algorithms that compute 
and extract attributes to represent the image. Image descriptors describe basic 
characteristics such as shape, color, texture, or motion. The image descriptors used in this paper are the following: 


2.3.1. Census transform (CT) 

CT is a non-parametric local transform that was first proposed by Zabih and Woodfill [11]. CT 
compares the intensity value of the center pixel with its eight neighboring pixels to obtain an 8-bit binary string, 
which is converted into a decimal number between 0 and 255. The CT value for the pixel $(x_c, y_c)$ is given by 
(6): 

$CT(x_c, y_c) = \sum_{p=0}^{7} s(i_p - i_c) \cdot 2^p, \quad s(x) = \begin{cases} 1, & x > 0 \\ 0, & \text{otherwise} \end{cases}$ (6) 

where $i_c$ is the gray value of the central pixel and $i_p$ is the value of its p-th neighbor. The most important properties 
of CT are its tolerance to illumination changes and gamma variations, and its computational simplicity. 
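As an illustration of (6), the following NumPy sketch computes the CT code of every interior pixel. The function name, the clockwise ordering of the neighbors (which fixes the bit weights), and the choice to drop border pixels are assumptions made for the example.

```python
import numpy as np

# (row, col) offsets of the eight neighbors; the bit order p = 0..7 is an assumption.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def census_transform(gray):
    """CT code (0..255) for every interior pixel; border pixels are dropped."""
    gray = gray.astype(np.int32)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.int32)
    for p, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        neighbor = gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes += (neighbor - center > 0).astype(np.int32) << p   # s(i_p - i_c) * 2^p
    return codes

patch = np.array([[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]])
ct_code = census_transform(patch)[0, 0]   # the four corners exceed the center: bits 0, 2, 4, 6
```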


2.3.2. Modified census transform (MCT) 

MCT was proposed by Froba and Ernst [12] in the context of texture analysis. MCT is a variation of 
CT where the measure incorporates the mean intensity of the neighborhood (including the center pixel). Let 
the mean intensity be denoted as $\bar{i}_c$. It can be calculated as: 

$\bar{i}_c = \frac{1}{9} \sum_{p=0}^{8} i_p$ (7) 

Then the MCT value for the pixel $(x_c, y_c)$ is given by (8): 

$MCT(x_c, y_c) = \sum_{p=0}^{8} s(i_p - \bar{i}_c) \cdot 2^p, \quad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & \text{otherwise} \end{cases}$ (8) 
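A corresponding sketch for (7) and (8) is given here. The function name, the row-major ordering of the nine window pixels, and the dropped border are assumptions; the code only illustrates how the neighborhood mean replaces the center pixel as the comparison value.

```python
import numpy as np

# 3x3 window including the center; the row-major bit order p = 0..8 is an assumption.
WINDOW_OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def modified_census_transform(gray):
    """MCT code (0..511) for every interior pixel; border pixels are dropped."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    windows = [gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc] for dr, dc in WINDOW_OFFSETS]
    mean = sum(windows) / 9.0                                  # equation (7)
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    for p, pixel in enumerate(windows):
        codes += (pixel - mean >= 0).astype(np.int32) << p     # equation (8)
    return codes

patch = np.array([[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]])
mct_code = modified_census_transform(patch)[0, 0]   # mean is 5: corners and center set bits
```

Because nine comparisons are made, the code needs 9 bits, which is why an MCT histogram has 512 bins before any pattern reduction.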


2.3.3. Local gradient pattern (LGP) 

LGP is an improvement over the traditional MCT, proposed by Jun and Kim [13]. LGP emphasizes 
the local variation in the neighborhood by incorporating the intensity gradient profile of the neighborhood in 
the measure. The average gradient is given by (9): 

$\bar{g} = \frac{1}{8} \sum_{p=0}^{7} g_p, \quad g_p = |i_p - i_c|$ (9) 

Then the LGP value for the pixel $(x_c, y_c)$ is given by (10): 

$LGP(x_c, y_c) = \sum_{p=0}^{7} s(g_p - \bar{g}) \cdot 2^p, \quad s(x) = \begin{cases} 1, & x > 0 \\ 0, & \text{otherwise} \end{cases}$ (10) 
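The LGP computation of (9) and (10) can be sketched in the same way. The function name, the neighbor ordering, and the dropped border are assumptions, and the strict comparison x > 0 follows the condition as it is legible in (10).

```python
import numpy as np

# (row, col) offsets of the eight neighbors; the bit order p = 0..7 is an assumption.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def local_gradient_pattern(gray):
    """LGP code (0..255) for every interior pixel; border pixels are dropped."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    gradients = [np.abs(gray[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc] - center)
                 for dr, dc in NEIGHBOR_OFFSETS]               # g_p = |i_p - i_c|
    mean_gradient = sum(gradients) / 8.0                       # equation (9)
    codes = np.zeros(center.shape, dtype=np.int32)
    for p, g in enumerate(gradients):
        codes += (g - mean_gradient > 0).astype(np.int32) << p # equation (10)
    return codes

patch = np.array([[5, 5, 5],
                  [5, 5, 5],
                  [5, 5, 9]])
lgp_code = local_gradient_pattern(patch)[0, 0]   # only the bottom-right gradient exceeds the mean
```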


2.4. Feature selection (FS) 

In real-world concept learning problems, data representation frequently employs a large number of 
features, only a subset of which may be related to the target concept. FS methods [14], [15] involve selecting 
input variables that have the strongest correlation with the target variable and removing irrelevant, 
insignificant, and unimportant features to achieve higher accuracy. These are the main reasons to use FS: i) 
simplifying data visualization and data understanding, ii) reducing storage requirements, iii) improving the 
accuracy of classifiers, iv) decreasing over-fitting, and v) reducing the training and prediction time of classifiers. 

There are many FS techniques; we are particularly interested in the SelectFromModel meta-transformer 
from the sklearn.feature_selection module [16], which can be used with any 
estimator that exposes a feature_importances_ (or coef_) attribute after fitting. Features are considered irrelevant and 
excluded if the corresponding feature importance values are below the provided threshold parameter. In this 
paper, we use a linear support vector machine (SVM) [17] to compute the feature importances, which are then 
used to discard irrelevant features with SelectFromModel. 
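The selection rule described above amounts to thresholding one importance value per feature. The NumPy sketch below mimics that rule without calling scikit-learn; `select_by_importance` is a hypothetical helper, the importance values are made up to stand in for the absolute weights of a fitted linear SVM, and using the mean importance as threshold is an assumption, since the threshold used in this work is not stated.

```python
import numpy as np

def select_by_importance(features, importances, threshold):
    """Keep the feature columns whose importance is at least `threshold`."""
    importances = np.asarray(importances, dtype=float)
    mask = importances >= threshold
    return features[:, mask], mask

# Three samples with four features each (toy data).
X = np.arange(12, dtype=float).reshape(3, 4)
# Hypothetical importances, standing in for |coef_| of a fitted linear SVM.
importances = np.array([0.90, 0.05, 0.40, 0.01])

X_selected, kept = select_by_importance(X, importances, threshold=importances.mean())
```

With these toy values the mean importance is 0.34, so the first and third columns are kept and the other two are discarded.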


3. PROPOSED RECOGNITION MODELS 

In order to study the importance of using degraded data for model evaluation, we propose two 
models shown in Figures 1 and 2. The first method applies the CT descriptor to the input image 
and extracts histograms from the resulting image by dividing it into 27 (9,3) blocks of size 6x4. 
Histograms for all blocks are concatenated. Final feature vectors are L2-normalized and used to train a radial 
basis function-based support vector machine classifier (RbfSVM) [17]. The second method is the same as the 
first, except that we use CT based on uniform patterns (CTU) instead of the CT descriptor. Patterns with at most two 
circular 0-1 and 1-0 transitions are referred to as uniform. For example, patterns 11111111, 00000000, 
00011100, and 11111011 are uniform, and patterns 01010000, 01001110, and 10101100 are not uniform. 
When we extract histograms, each uniform pattern has its own bin, and all non-uniform patterns are assigned 
to a single bin. 
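The uniform-pattern rule can be checked with a short pure-Python sketch. The function names are hypothetical; the code counts circular transitions, assigns one bin per uniform pattern, and sends all remaining patterns to a shared last bin.

```python
def circular_transitions(pattern, bits=8):
    """Number of 0-1 and 1-0 changes when the bit string is read as a circle."""
    count = 0
    for p in range(bits):
        current = (pattern >> p) & 1
        following = (pattern >> ((p + 1) % bits)) & 1
        count += int(current != following)
    return count

def is_uniform(pattern, bits=8):
    return circular_transitions(pattern, bits) <= 2

def uniform_bin_lookup(bits=8):
    """Map each pattern to a histogram bin; non-uniform patterns share the last bin."""
    uniform = [p for p in range(2 ** bits) if is_uniform(p, bits)]
    shared_bin = len(uniform)
    lookup = {p: shared_bin for p in range(2 ** bits)}
    for index, p in enumerate(uniform):
        lookup[p] = index
    return lookup

bins_8bit = len(set(uniform_bin_lookup(8).values()))   # 58 uniform patterns + 1 shared bin
bins_9bit = len(set(uniform_bin_lookup(9).values()))   # 74 uniform patterns + 1 shared bin
```

The two counts, 59 bins for 8-bit codes and 75 bins for 9-bit codes, agree with the reduced histogram lengths reported for CT/LGP and MCT in this paper.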


[Figure 1: flowchart with the stages Image, Extracting histograms and applying L2 normalization to samples, RbfSVM, Prediction] 

Figure 1. CT+RbfSVM model 


[Figure 2: flowchart with the stages Image, Extracting histograms and applying L2 normalization to samples, Prediction] 

Figure 2. CTU+RbfSVM model 


Another important objective is to reduce the effects of degradation on the recognition rate. We 
propose a new method that applies different processing techniques to images, especially edge filters, 
and extracts CT-MCT-LGP features. Local histograms are collected by dividing every image into 
27 (9,3) blocks of size 6x4. Histograms for all blocks are concatenated and all variables are scaled to [0,1]. 
Scaling is needed because different descriptors are combined, the classifier (SVM) is distance-based, and 
variable importance is measured afterwards; it ensures that all features contribute equally to the results. We use the FS 
technique to reduce the length of the final feature vector from 26,055 to 9,878. Final feature vectors are 
classified using the RbfSVM classifier. The architecture of our model is described in Figure 3. Using uniform 
patterns, the length of the CT and LGP feature vectors for a single block is reduced from 256 to 59, and from 512 
to 75 for the MCT feature vector. In our work, we use uniform patterns only to reduce the length of 
feature vectors (histograms) and to increase the speed of the classifiers. 

Figure 3. Proposed model against weather conditions 
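The block-histogram step and the [0,1] scaling can be sketched as follows. The helper names are hypothetical, the descriptor images are random stand-ins with the full 18x36 sample size (how border pixels are handled is not specified here), and 256 bins per block corresponds to an 8-bit descriptor without uniform patterns.

```python
import numpy as np

def block_histograms(codes, grid=(9, 3), n_bins=256):
    """Concatenate one histogram per block of a descriptor image."""
    rows, cols = grid
    bh, bw = codes.shape[0] // rows, codes.shape[1] // cols   # 4 rows x 6 columns per block
    hists = []
    for r in range(rows):
        for c in range(cols):
            block = codes[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            hists.append(np.bincount(block.ravel(), minlength=n_bins))
    return np.concatenate(hists).astype(float)

def min_max_scale(matrix):
    """Scale every column (feature) to [0, 1] across samples; constant columns map to 0."""
    lo, hi = matrix.min(axis=0), matrix.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (matrix - lo) / span

rng = np.random.default_rng(0)
samples = [rng.integers(0, 256, size=(36, 18)) for _ in range(4)]   # stand-in descriptor images
features = np.stack([block_histograms(s) for s in samples])
scaled = min_max_scale(features)
```

For one 8-bit descriptor this gives 27 x 256 = 6,912 values per image before any uniform-pattern reduction.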


4. RESULTS AND DISCUSSION 

For our experiments, we use the Daimler mono pedestrian classification benchmark dataset [18] to 
determine the efficacy of the different proposed methods. The classification dataset consists of five folders: three 
folders for training and two folders for testing (T1, T2). Each folder contains a total of 4,800 pedestrian and 
5,000 non-pedestrian samples taken from video images and scaled to a common size of 18x36. 

In our study, we use three different types of data: 

-  Normal-data: same as the original dataset. 

-  Blurred-data: we simulate blur by applying a (3,3) Gaussian filter to the images (kernel 
generated by the cv2.GaussianBlur(image,(3,3),0) function [19]). 

-  Contrast-data: we simulate contrast degradation by multiplying the image pixel values by a factor of 0.3. 
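The two simulated degradations can be approximated without OpenCV as sketched below. This is not a bit-exact replacement for cv2.GaussianBlur: the separable weights [0.25, 0.5, 0.25] for a 3x3 kernel with sigma set to 0, the mirrored border, and the rounding of the scaled pixel values are all assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed 1-D weights of the 3x3 Gaussian kernel when sigma is passed as 0.
KERNEL_1D = np.array([0.25, 0.5, 0.25])

def blur_3x3(image):
    """Separable 3x3 Gaussian blur with a mirrored border (assumption)."""
    padded = np.pad(image.astype(float), 1, mode="reflect")
    h, w = image.shape
    horizontal = sum(KERNEL_1D[k] * padded[:, k:k + w] for k in range(3))
    blurred = sum(KERNEL_1D[k] * horizontal[k:k + h, :] for k in range(3))
    return np.clip(np.round(blurred), 0, 255).astype(np.uint8)

def reduce_contrast(image, factor=0.3):
    """Multiply pixel values by `factor`; rounding to the nearest integer is an assumption."""
    return np.clip(np.round(image.astype(float) * factor), 0, 255).astype(np.uint8)

impulse = np.zeros((5, 5), dtype=np.uint8)
impulse[2, 2] = 160
blurred = blur_3x3(impulse)            # spreads the single bright pixel over a 3x3 area
darkened = reduce_contrast(np.full((5, 5), 100, dtype=np.uint8))
```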

During the model-building process, we use the k-fold cross-validation method [20]. Cross-validation 
techniques allow us to compare the performance of different machine learning models while selecting 
appropriate parameters. In our work, K=3: folder 1 and folder 2 are used for training, and folder 3 is used for 
validation. 

For model evaluation, we use the receiver operating characteristic (ROC) curve because it shows 
the true positive rate (TPR) and the false positive rate (FPR) for each threshold, unlike accuracy, which is computed only for 
threshold=0.5. We pick the best possible TPR and FPR (marked x on the ROC curves) using the G-mean score, and 
we use the area under the curve (AUC) because it tells how well our models distinguish 
between classes (pos/neg samples); it also serves as a summary of the ROC curve. Models are executed on 
Windows 7 with the Python integrated development environment (IDLE) v2.7.15, using a computer with an Intel 
Xeon CPU E3-1226 v3 quad-core 3.3 GHz and 12 GB of RAM. 
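The evaluation quantities can be illustrated with a small NumPy sketch. The function names and the toy scores are hypothetical, and the G-mean is taken in its usual form, the square root of TPR times (1 - FPR), which is an assumption about the exact definition used here.

```python
import numpy as np

def roc_points(labels, scores):
    """FPR and TPR obtained by sweeping the decision threshold over the scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.sort(np.unique(scores))[::-1]
    n_pos, n_neg = (labels == 1).sum(), (labels == 0).sum()
    tpr = [((scores >= t) & (labels == 1)).sum() / n_pos for t in thresholds]
    fpr = [((scores >= t) & (labels == 0)).sum() / n_neg for t in thresholds]
    # Start the curve at (0, 0), the point where nothing is predicted positive.
    return np.array([0.0] + fpr), np.array([0.0] + tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))

def best_gmean_point(fpr, tpr):
    """Operating point maximizing sqrt(TPR * (1 - FPR))."""
    gmean = np.sqrt(tpr * (1.0 - fpr))
    best = int(np.argmax(gmean))
    return float(fpr[best]), float(tpr[best])

labels = [0, 0, 0, 1, 1, 1, 1]                      # toy ground truth
scores = [0.1, 0.2, 0.6, 0.5, 0.7, 0.8, 0.9]        # toy classifier scores
fpr, tpr = roc_points(labels, scores)
auc = auc_trapezoid(fpr, tpr)
best_fpr, best_tpr = best_gmean_point(fpr, tpr)
```

On this toy example 11 of the 12 positive/negative pairs are ranked correctly, so the AUC is 11/12, and the G-mean selects the point with TPR 0.75 and FPR 0.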

In order to study the importance of using degraded data for model evaluation, we train the two 
models shown in Figures 1 and 2. The results are presented in Figures 4 and 5 and Tables 1 and 2, respectively. 
The CT and CTU models show comparable results (AUC, TPR, FPR) for normal images but large differences for 
degraded images, as shown in Table 1 and Table 2, implying that model evaluation should include both normal 
and degraded images. 

To demonstrate the effectiveness of our proposed model (Figure 3) against weather 
degradation, we compare it with two common methods. The first method, proposed by Dalal and 
Triggs [21], is based on histogram of oriented gradients (HOG) features, inspired by the scale invariant feature 
transform (SIFT) [22], and an SVM learning machine. HOG features model form and appearance 
using normalized histograms of the image gradient orientation. The idea is to divide the image into small 
regions called cells. A cell is described by a histogram of its local gradients binned by orientation and 
weighted by magnitude. These cells are grouped into larger areas called blocks. A block is described by 
a feature vector generated by concatenating and normalizing the histograms of its cells. The feature vectors for all 
blocks are concatenated to produce the final feature vector. All final feature vectors are classified using a 
linear SVM. 


[Figure 4: ROC curves of ctrbfsvm on valset, testset1, and testset2, each in normal, blurred, and contrast versions; x marks the best rate for each curve; x-axis: False Positive Rate (0.000 to 0.200)] 

Figure 4. CT+RbfSVM ROC 

[Figure 5: ROC curves of cturbfsvm on valset, testset1, and testset2, each in normal, blurred, and contrast versions; x marks the best rate for each curve; x-axis: False Positive Rate (0.000 to 0.200)] 

Figure 5. CTU+RbfSVM ROC 

Table 1. CT+RbfSVM performance 
Data   Degradation effects  AUC       TPR(%)   FPR(%) 
Val    None                 0.998366  98.0833  1.7800 
Val    Blurred              0.991940  95.5625  4.0600 
Val    Contrast             0.995720  96.8333  3.1400 
Test1  None                 0.971521  90.5     8.5 
Test1  Blurred              0.955368  88.6042  10.46 
Test1  Contrast             0.972525  89.5625  7.46 
Test2  None                 0.982416  92.7292  6.44 
Test2  Blurred              0.963774  89.5208  10.04 
Test2  Contrast             0.980687  92.8125  7.46 

Table 2. CTU+RbfSVM performance 
Data   Degradation effects  AUC       TPR(%)   FPR(%) 
Val    None                 0.998315  97.8750  1.6 
Val    Blurred              0.974226  93.6250  9.3200 
Val    Contrast             0.991852  95.9583  4.74 
Test1  None                 0.970978  91.8125  9.86 
Test1  Blurred              0.930177  88.5     16.38 
Test1  Contrast             0.967501  88.75    8.7 
Test2  None                 0.982003  93.7917  8 
Test2  Blurred              0.926765  86.4375  16.68 
Test2  Contrast             0.977381  93.25    8.9 


From our training results, it appears that fine binning (9 orientations) and large-scale features (blocks 
of 2x2 cells of 2x2 pixels) are the best parameters for the Daimler dataset. Block histograms are normalized 
using the L1 technique and final feature vectors are normalized using the L2 technique. For better 
performance, we train final feature vectors using RbfSVM instead of a linear SVM because of the small size of 
the images in the dataset. 
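A simplified HOG extractor with the parameters reported above (9 orientation bins, cells of 2x2 pixels, blocks of 2x2 cells, L1 block normalization, L2 normalization of the final vector) might look like the sketch below. Several details are assumptions because they are not specified here: centered-difference gradients, unsigned orientations in [0, 180), hard assignment to bins without interpolation, and blocks that overlap with a stride of one cell. The function name is hypothetical.

```python
import numpy as np

def hog_sketch(gray, cell=2, block=2, n_bins=9, eps=1e-12):
    """Simplified HOG feature vector (see the stated assumptions)."""
    gray = gray.astype(float)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]          # centered differences (assumption)
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation (assumption)
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)

    n_rows, n_cols = gray.shape[0] // cell, gray.shape[1] // cell
    cells = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            b = bins[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            m = magnitude[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell].ravel()
            cells[r, c] = np.bincount(b, weights=m, minlength=n_bins)

    blocks = []
    for r in range(n_rows - block + 1):               # stride of one cell (assumption)
        for c in range(n_cols - block + 1):
            v = cells[r:r + block, c:c + block].ravel()
            blocks.append(v / (np.abs(v).sum() + eps))    # L1 block normalization
    feature = np.concatenate(blocks)
    return feature / (np.linalg.norm(feature) + eps)      # L2 on the final vector

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(36, 18))          # random stand-in for an 18x36 sample
hog_vector = hog_sketch(sample)
```

Under these assumptions an 18x36 sample yields 18x9 cells and 17x8 overlapping blocks, that is 17 x 8 x 36 = 4,896 features; a different block stride would change this length.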

The second method [23] is based on extracting features by applying a Gabor filter bank (GFB) and 
a linear SVM for classification. The best modified feature extractor that we obtained for our dataset consists 
of three steps: convolution of the input image with multiple Gabor filters, splitting the image into sub-images, 
and calculation of statistical feature values. We apply 120 filters to the input image; the size of each kernel is 
16x16. Gabor filters are generated by varying sigma over 0.001, 1.62377674e-03, 2.63665090e-03, 
4.28133240e-03, 6.95192796e-03, 1.12883789e-02, 1.83298071e-02, 2.97635144e-02, 4.83293024e-02, 
7.84759970e-02, 1.27427499e-01, 2.06913808e-01, 3.35981829e-01, 5.45559478e-01, 8.85866790e-01, 
1.43844989, 2.33572147, 3.79269019, 6.15848211, 10 (generated by the numpy.logspace(-3,1,num=20) function 
[24], [25]) and theta (0 to 5π/6 in steps of π/6); lambda is set to 1 and gamma to 0.001. 
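The parameter grid of the filter bank can be reproduced as sketched below. The kernel formula is the standard real-valued Gabor function and is an assumption about how the kernels were generated; the function name, the phase offset psi = 0, and the sampling grid for the even kernel size are also assumptions.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma, psi=0.0):
    """Real part of a Gabor filter sampled on a size x size grid (standard formulation)."""
    coords = np.arange(size) - size // 2              # for size 16: -8 .. 7 (assumption)
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / lambd + psi)
    return envelope * carrier

sigmas = np.logspace(-3, 1, num=20)                   # 0.001 ... 10, as listed in the text
thetas = np.arange(6) * np.pi / 6                     # 0, pi/6, ..., 5*pi/6
bank = [gabor_kernel(16, s, t, lambd=1.0, gamma=0.001)
        for s in sigmas for t in thetas]
```

The 20 sigma values combined with the 6 orientations give the 120 filters mentioned above.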

Each resulting 18x36 image is divided into 27 (9,3) sub-images of size 6x4. For each sub-image, three 
statistical features (mean, standard deviation, and skewness) are calculated. Consequently, the dimension of 
the feature vector is 9,720 (20x6x27x3=9720). All variables of the feature vectors are scaled to [0,1] because the 
features (mean, standard deviation, and skewness) have widely different ranges; the vectors are then used to train 
an RbfSVM classifier. Figure 6 and Table 3, Figure 7 and Table 4, and Figure 8 and Table 5 show the weather 
effects on our proposed model and on the HOG and GFB detectors, respectively. 
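The per-block statistics and the resulting feature dimension can be checked with the following sketch. The function name is hypothetical, the filter responses are random stand-ins, and the skewness is taken as the biased third standardized moment, set to 0 for constant blocks; both conventions are assumptions.

```python
import numpy as np

def block_statistics(response, grid=(9, 3)):
    """Mean, standard deviation and skewness of every sub-image of one filter response."""
    rows, cols = grid
    bh, bw = response.shape[0] // rows, response.shape[1] // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            block = response[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw].astype(float).ravel()
            mean = block.mean()
            std = block.std()
            # Third standardized moment; defined as 0 for a constant block (assumption).
            skew = ((block - mean) ** 3).mean() / std ** 3 if std > 0 else 0.0
            feats.extend([mean, std, skew])
    return np.array(feats)

rng = np.random.default_rng(0)
responses = rng.random((120, 36, 18))                 # stand-in for the 120 filtered images
feature_vector = np.concatenate([block_statistics(r) for r in responses])
```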


[Figure 6: ROC curves of the proposed method on valset, testset1, and testset2, each in normal, blurred, and contrast versions; x marks the best rate for each curve; y-axis: True Positive Rate] 

Figure 6. Proposed method ROC 

Figure 7. HOG+RbfSVM ROC 

Table 3. Proposed method performance 
Data   Degradation effects  AUC       TPR(%)   FPR(%) 
Val    None                 0.998877  98.3542  1.54 
Val    Blurred              0.995393  96.8958  3.28 
Val    Contrast             0.998095  98.1875  2.24 
Test1  None                 0.989015  94.9375  27 
Test1  Blurred              0.985071  94       6.26 
Test1  Contrast             0.989094  96.2708  6.82 
Test2  None                 0.994067  96.5208  3.92 
Test2  Blurred              0.985453  93.9792  5.84 
Test2  Contrast             0.994187  95.6042  2.98 

Table 4. HOG+RbfSVM performance 
Data   Degradation effects  AUC       TPR(%)   FPR(%) 
Val    None                 0.991113  95.0208  4.16 
Val    Blurred              0.978146  92.2292  7.14 
Val    Contrast             0.988583  94.3750  4.76 
Test1  None                 0.964186  88.6458  9.14 
Test1  Blurred              0.938031  85.5833  13.4 
Test1  Contrast             0.962219  89.25    10.16 
Test2  None                 0.979831  92.0208  6.34 
Test2  Blurred              0.954836  88.6875  11.18 
Test2  Contrast             0.978818  91.7917  6.58 




Our comparison shows that every image recognition technique has its 
own characteristics. The recognition rate of the GFB model is influenced more by contrast degradation than by blur. 
On the other hand, the HOG model is affected more by blur than by contrast degradation; see 
Figure 7, Table 4, Figure 8, and Table 5. As a result, choosing appropriate image restoration techniques for 
better model performance depends on how models react to degradation effects. 


[Figure 8: ROC curves of gfbrbfsvm on valset, testset1, and testset2, each in normal, blurred, and contrast versions; x marks the best rate for each curve; x-axis: False Positive Rate; y-axis: True Positive Rate] 

Figure 8. GFB+RbfSVM ROC 


Table 5. GFB+RbfSVM performance 
Data   Degradation effects  AUC       TPR(%)   FPR(%) 
Val    None                 0.993526  96.7292  4.32 
Val    Blurred              0.988470  95.8333  5.82 
Val    Contrast             0.954620  87       9.6 
Test1  None                 0.975297  91.5833  7.74 
Test1  Blurred              0.967258  91.6667  9.44 
Test1  Contrast             0.961953  89.9792  10.3 
Test2  None                 0.959085  87.9792  74 
Test2  Blurred              0.960343  87.5833  7.36 
Test2  Contrast             0.965632  89.1458  7.68 

For validation samples, the AUC of our proposed method under contrast degradation decreases by 
0.08%, that of the HOG detector by 0.26%, and that of the Gabor detector by 3.92%. Under blur, the AUC of 
our proposed method decreases by 0.35%, that of the HOG detector by 1.31%, and that of the Gabor detector 
by 0.51%. The prediction time of our models can be decomposed into two main elements: the 
time to extract features from the input image and the time for the classifier to predict from those features. Our 
proposed method requires around 0.2s to predict the input image, compared with 0.04s for the HOG 
detector and 0.8s for the GFB detector. The prediction time might seem a priori a weakness 
of our model, due to the computational cost of the multiple filters and descriptors applied, but our model has 
the advantage of being less affected by degradation effects than the common HOG and 
GFB detectors. Figure 9 shows some images from the T1 folder for which our proposed model resisted 
weather conditions while the other models (HOG and GFB) did not resist the contrast or blurred conditions. Another 
important point is that low-resolution resized images lose critical information. As a 
result, models trained with small-size images and tested on large images may produce false-positive predictions. 
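The percentages quoted for the validation samples are consistent with relative AUC decreases computed from the Val rows of Tables 3, 4, and 5. The short check below reproduces them; reading the figures as relative decreases is an interpretation, and the helper name is hypothetical.

```python
def relative_decrease_percent(reference, degraded):
    """Relative drop of `degraded` with respect to `reference`, in percent."""
    return 100.0 * (reference - degraded) / reference

# Validation AUC values copied from Tables 3, 4 and 5.
validation_auc = {
    "proposed": {"none": 0.998877, "blurred": 0.995393, "contrast": 0.998095},
    "HOG":      {"none": 0.991113, "blurred": 0.978146, "contrast": 0.988583},
    "GFB":      {"none": 0.993526, "blurred": 0.988470, "contrast": 0.954620},
}

drops = {
    model: {kind: round(relative_decrease_percent(auc["none"], auc[kind]), 2)
            for kind in ("blurred", "contrast")}
    for model, auc in validation_auc.items()
}
```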

Figure 9. Images from the T1 folder for which our proposed model resisted weather conditions while the 
other models (HOG and GFB) did not resist the contrast or blurred conditions 


5. CONCLUSION 

In this paper, different types of pedestrian recognition systems are implemented and the results are 
compared using the performance parameters AUC, TPR, and FPR. Our proposed method, based on extracting 
CT-MCT-LGP features from images after applying multiple edge filters, shows good performance in terms of 
recognition rate and resistance to weather conditions compared to other common models like HOG and GFB 
detectors. The results are very promising, but there are still several directions for our future research. The first 
is the development of a pedestrian detection system using cascading classifiers and parallel computing 
technology (GPU acceleration) to achieve both a high detection rate and real-time detection. Next, we intend 
to use infrared (IR) cameras to detect pedestrians at night. Finally, we plan to up-scale low-resolution training and 
testing images to improve image details and increase the detection rate. 